Skip to content

patches from Duncan#5

Closed
hsenag wants to merge 3 commits into
masterfrom
dcoutts-20110724
Closed

patches from Duncan#5
hsenag wants to merge 3 commits into
masterfrom
dcoutts-20110724

Conversation

@hsenag

@hsenag hsenag commented Aug 13, 2011

Copy link
Copy Markdown
Member

These patches were submitted by Duncan Coutts by email. I've turned them into a pull request for tracking purposes.

About the first two:

The first is to fix the support for HTTP digest authentication. The code
seems to do everything right except that it misformats the md5 digests.
(This is slightly odd because it cannot ever have worked like this.
Presumably someone tested it previously and then the md5 formatting got
broken later.)

Credit for this one goes to Matthew Gruen. We've tested this new code
against happstack, but not against e.g. apache (though we do know the
happstack server-side digest code works with various standard web
browsers).

We will be relying on this first fix in future so that cabal-install
(and all other Haskell clients like the mirroring client) can
authenticate with the hackage-server.

The second fix is for the connection pool logic in the Browser
module/monad.

The browser monad keeps its pool of open connections so that it can
reuse connections rather than creating new ones. Obviously it is
important that when it tries to reuse a connection, that it picks a
correct one.

Currently the connections in the pool are only identified by the
hostname in the URL, not the combinator of hostname and port number.
This means that if you have open connections for say, localhost and
localhost:8080 then when connections are reused it'll pick one of these
arbitrarily which can obviously lead to talking to the wrong server and
failure.

Internally the HandleStream records the host name and isTCPConnectedTo
looks at that.

[ Generally the connection pool logic is a bit odd, instead of filtering
by isTCPConnectedTo, I'd have thought it more sensible to keep a finite
map from (host, port) to connection. isTCPConnectedTo also checks (in an
odd way) that the connection is still open. If it turns out it has been
closed, the connection is not removed from the pool, rather a new
connection is opened instead and the old connections eventually makes
its way to the back of the connection list, which means in the mean time
we could well end up expiring and closing perfectly good connections. I
realise of course you didn't write this mess ]

I've fixed it by instead of relying on the hostname recorded inside the
HandleStream via isTCPConnectedTo, keeping the hostname and port number
in the connection pool itself in the Browser. So then when picking a
connection we filter by the hostname and port number and finally check
if the connection is still connected.

  • conn <- ioAction $ filterM (\c -> c isTCPConnectedTo uriAuthToString hst) pool
  • conn <- ioAction $ filterM isTCPConnected
  •                         [ c | (hostname,portnum, c) <- pool
    
  •                             , hostname == uriRegName hst
    
  •                             , portnum  == uriAuthPort Nothing hst ]
    

To do that I've introduced variants of isTCPConnectedTo and
isConnectedTo that just test if they're connected and don't take any
hostname as a parameter. As far as I can see both isTCPConnectedTo and
isConnectedTo are pretty pointless functions so I've also taken the
liberty of deprecating them (the module they're in is mostly internal
anyway).

I've tested this fix with my hackage-mirror client, mirroring between
localhost and localhost:8080.

And about the third:

Oh, and another one!

This is a fix to Network/Browser.hs to return 304 "not modified"
responses directly and not to treat them as a redirection (since it's
not a redirection).

The 304 code is returned by servers in response to conditional GET/PUT
requests (e.g. GET met this resource, but only if it changed more
recently than time t), to say that the resource has not changed. This
conditional stuff is needed to validate client-side caches. I'm now
using this in the hackage-mirror and it would be nice to have it in
cabal-install too, as it'd make cabal update a bit cheaper.

You may notice that this partially reverts a patch from aslatter from
last year. He changed:

  •  (3,0,x) | x `elem` [2,3,1,7]  ->  do
    
  •  (3,0,x) | x /= 5  ->  do
    

and I've changed it back

  •  (3,0,x) | x /= 5  ->  do
    
  •  (3,0,x) | x `elem` [2,3,1,7]  ->  do
    

This isn't really the important part of the patch, the important bit is:

  •  (3,_,_) -> redirect uri rsp
    
    _ -> return (Right (uri,rsp))

http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection

What is going on is that there are four "30x" response codes that should
be treated as redirect: 301, 302, 303 and 307.

There are a few other "30x" codes which are not redirects, or not
standard redirects: 300, 304, 305 and 306. The current code handles 305
explicitly. The current code treats 300 as a redirect too, using the
catch-all case:

  (3,_,_) -> redirect uri rsp

While the HTTP spec says that clients MAY treat 300 as a redirect,
that's only if the server provides a location, which is optional for
300. The current redirect code always expects a location, so it's been
handling 300 wrong anyway (300 is almost never used in practice). So I
decided that instead of preserving the current wrong behaviour for 300,
it's better to just return it to the client.

Finally, that leaves 304 and 306.
* 306 is explicitly not used (says so in the spec).
* 304 is not a redirect, it's the answer to conditional GET/PUT
when the resource has not changed. So again it must go to the
caller, and not be treated as a redirect (it has no location
header so the current code fails)

The current code treats all other unknown 3xx codes as redirects and
looks for a location header. I think this is not a safe thing to do, we
cannot predict the behaviour of any newly allocated 3xx codes, so we
should just return them to the caller.

This simplifies the logic because it means we only match the ones we
explicitly handle, and return the others to the caller. Because we no
longer have the catch-all redirect case, then the 'redirect' function is
dead code.

Clear as mud?

It should be using hex encoding, not raw bytes.
Previously it would confuse localhost and localhost:8000 and
so would sometimes reuse a connection to the wrong server.
That is, do not treat it as a redirect, just return the response.
This code is returned by servers in response to conditional GET/PUT
requests. Such conditional requests are essential for proper
client-side caching.
@batterseapower

Copy link
Copy Markdown
Contributor

Please merge this! I've also been bitten by the 304 redirect issue.

@hsenag

hsenag commented Sep 27, 2011

Copy link
Copy Markdown
Member Author

On 27/09/2011 08:20, Max Bolingbroke wrote:

Please merge this! I've also been bitten by the 304 redirect issue.

I'll merge it as soon as it has tests - which are in my own queue to
write but obviously I wouldn't say no if someone else did it :-)

Ganesh

@batterseapower

Copy link
Copy Markdown
Contributor

Laudable, but it's not like the existing code has any tests. So merging this would just swap one set of untested behaviour for another - and what is the harm in that? :^)

@hsenag

hsenag commented Sep 28, 2011

Copy link
Copy Markdown
Member Author

On 27/09/2011 23:24, Max Bolingbroke wrote:

Laudable, but it's not like the existing code has any tests. So merging this would just swap one set of untested behaviour for another - and what is the harm in that? :^)

Because then there's nothing to stop the current situation persisting
indefinitely. If I insist on changes being tested, we end up with
something more maintainable, albeit that improvements happen more slowly.

Ganesh

@hsenag

hsenag commented Oct 6, 2011

Copy link
Copy Markdown
Member Author

I've now fixed the connection pool problem a different way.

@hsenag

hsenag commented Oct 7, 2011

Copy link
Copy Markdown
Member Author

I've now added tests for redirection and cherry-picked the patch for that.

@batterseapower

Copy link
Copy Markdown
Contributor

Thank you!

@hsenag

hsenag commented Oct 9, 2011

Copy link
Copy Markdown
Member Author

authentication tests added and the digest authentication fix cherry-picked

@hsenag hsenag closed this Oct 9, 2011
@aslatter

aslatter commented Nov 2, 2011

Copy link
Copy Markdown
Contributor

I haven't dug through this patch, but the comment mentions that it reverts one of my old commits -

The goal of the old one was to stop some redirects on POST from being issued as a GET to the new location - at minimum a 307 in response to a POST should not be redirected with a GET, and I think my patch may have made the same change for 302, I don't remember.

I'll try to remember to test this out with HEAD.

@aslatter

aslatter commented Nov 2, 2011

Copy link
Copy Markdown
Contributor

Nevermind my concern - the part of the old commit which only used GET for some redirects survived this commit :-)

@hsenag

hsenag commented Nov 3, 2011

Copy link
Copy Markdown
Member Author

I wrote some tests with reference to the spec and also checked your original email/patch before applying Duncan's patch - the tests that were relevant to your patch kept working before and after Duncan's changes so I'm fairly sure nothing relevant was broken by that.

@hsenag

hsenag commented Nov 3, 2011

Copy link
Copy Markdown
Member Author

Except I think we don't yet have any tests for POST so that may be a bit more "at risk".

@hsenag hsenag deleted the dcoutts-20110724 branch August 24, 2014 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants